DETAILED ACTION
Allowable Subject Matter 

1. Claims 12, 15-17, and 19-20 would be allowable over the prior art of record if rewritten to
include all of the limitations of the base claim and any intervening claims. The whole structure
and interaction expressed by the combination of all limitations is not made obvious compared to
the prior art of record for the whole invention of those dependent claims, particularly with
assessing quality of speech or a distorted version of the speech based on comparing articulation
power and non-articulation power and comparing quality of the speech and a distorted version of the
speech. Certain assumptions that make the limitations clear have been considered for the claims,
as described next or elsewhere in this Office action. The claims should also be rewritten to
overcome any objections or rejections under 35 U.S.C. 112(2), especially as appearing in this
Office action.

Oath/Declaration 

2. The replacement declaration submitted by the Applicant was received on October 3, 2003, 
and this declaration is substantively acceptable to the Examiner. 

Information Disclosure Statement 

3. A copy of the search report of the European Patent Office (04253532.8-2218-) (received
January 28, 2005) is present. The search report and its cited documents have been considered by 
the Examiner. 

4. The information disclosure statement filed January 28, 2005 seems to incorrectly indicate
that an English translation was provided. Only one of the two abstracts provided for
WO 02/43051 is in English. The Examiner has changed citations on the disclosure statement to
correspond to the copies provided. If the changes are acceptable to the Applicant, no action is 
required. Further submissions should comply with 37 CFR 1.97 and 37 CFR 1.98 as of the date of 
their submission. 

Drawings 

5. The proposed substitute drawings (5 sheet(s), received October 2, 2003) are present and 
are now the Figs. 1-6 of record. These drawing sheets are substantively acceptable to the 
Examiner. 

Specification 

6. A substitute specification including the claims, abstract, and title is required pursuant to 37 
CFR 1.125(a) because the nature of the specification currently of record renders it difficult to 
consider the application. Copies of the Office record of originally filed specification, claims, and 
abstract are included with this Office action; the difficulty should be apparent. See also US Patent 
Application Publication 2004/0267523, which purports to be a publication of this application; note 
that there are differences. 

A substitute specification must not contain new matter. The substitute specification must 
be submitted with markings showing all the changes relative to the immediate prior version of the 
specification of record. The text of any added subject matter must be shown by underlining the 
added text. The text of any deleted matter must be shown by strikethrough except that double
brackets placed before and after the deleted characters may be used to show deletion of five or 
fewer consecutive characters. The text of any deleted subject matter must be shown by being 
placed within double brackets if strikethrough cannot be easily perceived. An accompanying clean 
version (without markings) and a statement that the substitute specification contains no new matter 
must also be supplied. Numbering the paragraphs of the specification of record is not considered a 
change that must be shown. 

The copies of the originally filed specification, claims, and abstract that are included with 
this Office action might make it easier for the Applicant to comply with requirements for markings 
and decisions regarding new matter. 

Note elsewhere in this Office action and in US Patent Application Publication 
2004/0267523 that the proposed replacement drawings are now of record and this Office action 
does not require that they be provided again in this application. 

Claim Informalities 

7. Claim 1, and by dependency claims 2-20, are objected to under 37 CFR 1.75(a) because
the meaning of the phrase "a first and second speech signal" (page 25, lines 5-6) needs
clarification. This phrase recites only one signal. However, in order to provide antecedence for
each of "the first speech signal" and "the second speech signal", the Examiner has interpreted this
phrase as —first and second speech signals—.

8. Claim 1, and by dependency claims 2-20, are objected to under 37 CFR 1.75(a) because
the meaning of the phrase "the first and second speech qualities" (page 25, line 8) needs 
clarification. Because no speech qualities were previously recited, it may be unclear as to what 
element this phrase refers. There may be confusion with "first and second speech signals", 
because each signal might somehow have quality, or with the singular assessment, because the 
quality assessment might somehow have quality. To further timely prosecution and evaluate prior 
art, the Examiner has interpreted this phrase as —first and second speech qualities—.

9. Claim 2 is objected to under 37 CFR 1.75(a) because the meaning of the phrase "the
additional steps" (page 25, line 11) needs clarification. Because only one additional step is recited
by claim 2, it may be unclear as to what element this phrase refers. To further timely prosecution
and evaluate prior art, the Examiner has interpreted this phrase as —the additional step—.

10. Claim 2 is objected to under 37 CFR 1.75(a) because the meaning of the phrase "the first
and second speech quality assessments" (page 25, line 12) needs clarification. Because only one
assessment was previously recited, it may be unclear as to what element this phrase refers. To
further timely prosecution and evaluate prior art, the Examiner has interpreted this phrase as —the
first and second speech quality assessment—.

However, the Examiner believes that the inventions of both claim 1 and claim 2 might 
more closely reflect the invention disclosed in the specification and in Fig. 1 if claim 1 were 
amended to recite "first and second speech quality assessments", rather than the Examiner's 
assumption of a single assessment for claim 2. Note also "the step assessing" in claim 7 and "the 
step of assessing" in claim 14. 

11. Claim 7, and by dependency claims 8-20, are objected to under 37 CFR 1.75(a) because
the meaning of the phrase "the step assessing the second or first speech quality" (page 25, lines
27-28) needs clarification. Because no step assessing was previously recited, it may be unclear as
to what element this phrase refers. To further timely prosecution and evaluate prior art, the
Examiner has interpreted this phrase as —the step of determining—. Note also "the step of
assessing" in claim 14.

12. Claim 7, and by dependency claims 8-20, are objected to under 37 CFR 1.75(a) because 
the meaning of the phrases "the speech signal or distorted speech signal" (page 25, lines 29-30 and 
page 26, lines 1-2) needs clarification. Because two speech signals and no distorted speech signal 
were previously recited, it may be unclear as to what element this phrase refers. To further timely 
prosecution and evaluate prior art, the Examiner has interpreted this phrase as —the second speech
signal or the distorted version—. 

13. Claim 7, and by dependency claims 8-20, are objected to under 37 CFR 1.75(a) because
the meaning of the phrase "the comparison" (page 26, line 3) needs clarification. Because no
comparison was previously recited, but both comparing qualities and comparing powers were
previously recited, it may be unclear as to what element this phrase refers. To further timely
prosecution and evaluate prior art, the Examiner has interpreted this phrase as —the comparison of
the powers—. Note also "the comparison" in claim 14.

14. Claim 12 is objected to under 37 CFR 1.75(a) because the meaning of the phrase "the
ratio" (page 26, line 18) needs clarification. Because no ratio was previously recited, it may be
unclear as to what element this phrase refers. To further timely prosecution and evaluate prior art,
the Examiner has interpreted this phrase as —a ratio—. Note that "a ratio" appears in claim 11, but
this claim is not dependent on claim 11.

15. Claim 14 is objected to under 37 CFR 1.75(a) because the meaning of the phrase "the
comparison" (page 26, line 28) needs clarification. Because all (three) of a comparison,
comparing qualities, and comparing powers were previously recited, it may be unclear as to what
element this phrase refers. To further timely prosecution and evaluate prior art, the Examiner has
interpreted this phrase as —the comparison of the powers—.

16. Claim 15 is objected to under 37 CFR 1.75(a) because the meaning of the phrase "the local
speech quality" (page 26, line 30) needs clarification. Because no local speech quality was
previously recited, it may be unclear as to what element this phrase refers. To further timely
prosecution and evaluate prior art, the Examiner has interpreted this phrase as —a local speech
quality—. Note that "a local speech quality" appears in claim 14, but this claim is not dependent on
claim 14. The Applicant also may wish to consider whether "further" is a proper adverb for
"determined" in view of whatever amendments may be made to this claim.

17. Claim 16 is objected to for the same reasons as claim 15 because the limitations are recited 
using obviously similar phrases. 

18. Claim 18, and by dependency claims 19-20, are objected to under 37 CFR 1.75(a) because
the meaning of the phrase "the speech signal" (page 27, line 12) needs clarification. Because two
speech signals were previously recited, it may be unclear as to what element this phrase refers. To
further timely prosecution and evaluate prior art, the Examiner has interpreted this phrase as —the
second speech signal or the distorted version—.

19. Claim 20 is objected to under 37 CFR 1.75(a) because the meaning of the phrase "the
plurality of modulation spectrums" (page 27, lines 21-22) needs clarification. Because no
plurality of modulation spectrums was previously recited, it may be unclear as to what element
this phrase refers. To further timely prosecution and evaluate prior art, the Examiner has
interpreted this phrase as —a plurality of modulation spectrums—. Note that "a plurality of
modulation spectrums" appears in claim 19, but this claim is not dependent on claim 19.

20. The Examiner notes, without objection, the possibility of informalities in the claims. The
Applicant may wish to consider changes during normal review and revision of the disclosure.

In claim 7 (page 26, lines 2-3), should the phrase "and and" be —and—?

Claim Rejections - 35 USC §102 

21. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public use or on sale 
in this country, more than one year prior to the date of application for patent in the United States. 

Hollier

22. Claims 1-6 are rejected under 35 U.S.C. 102(b) as being anticipated by Hollier et al. [US
Patent 5,799,133].

23. Regarding claim 1, Hollier [at column 6, lines 36-51] describes a method of assessing 
speech by describing the content and functionality of the recited limitations recognizable as a 
whole to one versed in the art as the following terminology: 

determining a first and second assessment(s) for first and second signals [at column 9, 
lines 14-19, as generate two measures corresponding to the "good" sample and the distorted 
sample]; 

comparing first and second to obtain a compensated assessment [at column 9, lines 15-18,
as two measures corresponding to the "good" sample and the distorted sample are then compared 
to produce an error surface]; 

the first signal being a distorted version of the second signal [at column 8, lines 44-48, as 
the corresponding second version has distortion of the "good" signal sample]; 

the signals are speech signals [at column 8, lines 37-48, as speech segments are to be 
assessed]; 

assessments and compensated assessment are speech quality assessments [at column 6, 
lines 36-39, as the model and parameters are for the purpose of measuring signal quality, i.e. the 
perceived speech quality].

24. Regarding claim 2, Hollier also describes: 

prior to the determining [at column 10, lines 42-52, as the source of the distorted signal 
may be supplied from a store that is pre-generated];

distorting the second signal to produce the first signal [at column 8, lines 44-48, as the 
corresponding second version has distortion of the "good" signal sample]. 

25. Regarding claim 3, Hollier also describes: 

the qualities are assessed using an identical technique for objective assessment [at 
column 8, line 63-column 9, line 1, as the "good" and distorted samples have the same model 
process applied]. 

26. Regarding claim 4, Hollier also describes: 

the compensated assessment corresponds to a difference between the qualities [at 
column 9, lines 29-32, as values on the error surface are determined as the difference]. 

27. Regarding claim 5, Hollier also describes:

the compensated assessment corresponds to a ratio between the qualities [at column 9, 
lines 54-67, as the error entropy of the distortion (the error distribution) corresponds to the 
logarithm of the magnitude of the error value (numerator) divided by the error energy distribution 
(reciprocal, denominator)]. 

28. Regarding claim 6, Hollier also describes: 

the qualities are assessed using auditory-articulatory analysis [at column 6, lines 26-38, as 
the method for estimating parameters relies on audible features for auditory models and a vocal 
tract model of how sounds are produced to recognize inaudible differences]. 

Claim Rejections - 35 USC §103 

29. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness 
rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

Hollier and Rothenberg

30. Claims 7-10, 13-14, and 18 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Hollier et al. [US Patent 5,799,133] in view of Rothenberg [US Patent 5,454,375].

31. Regarding claim 7, Hollier describes the included claim elements by dependency as
indicated elsewhere in this Office action. Hollier [at column 2, lines 1-5] also describes that the 
spectral content of speech can be used to analyze other characteristics that are not directly 
observable. 

However, Hollier does not explicitly describe comparing articulation power and non-
articulation power, wherein articulation and non-articulation powers are associated with
articulation and non-articulation frequencies of the signals, and assessing speech quality based on 
their comparison. 

Like Hollier, Rothenberg [at column 1] describes performing auditory-articulatory analysis
using speech spectra, and Rothenberg describes the content and functionality of the recited 
limitations recognizable as a whole to one versed in the art as the following terminology: 

power for the speech signal (or distorted version) [at column 6, lines 13-39, as volume
velocity of airflow pressure for voice across the impedance of a membrane, measured in some
convenient form];

articulation power and non-articulation power associated with articulation and non-
articulation frequencies of the speech signal (or distorted version) [at column 1, lines 36-53, as
airflow restricted to low, principally subsonic frequencies, below about 20 Hz that measures
articulatory patterns of the voice source and airflow making up the voice to at least about 1000 Hz
at higher, acoustic frequencies];

comparing them [at column 7, lines 26-29, as sufficiently such that a value of the high
frequency energy value is less than a value of the low frequency energy];

and assessing the second speech quality (or first speech quality) based on their comparison 
[at column 2, lines 21-31, as the quality of the sound produced in the nature of the voice, including 
a change in the airflow variables being measured]. 

As indicated, Rothenberg shows that comparing articulation power and non-articulation 
power, wherein articulation and non-articulation powers are associated with articulation and non- 
articulation frequencies of the signals, and assessing speech quality based on their comparison was 
known to artisans at the time of invention. Since Hollier suggests also modeling inaudible 
characteristics for speech quality assessment and Rothenberg [at column 2, lines 21-25] also 
points out that changes in the vocal tract acoustics can distort the voice, it would have been 
obvious to one of ordinary skill in the art of auditory articulatory analysis at the time of invention
to include the concepts described by Rothenberg, at least including comparing articulation power
and non-articulation power, wherein articulation and non-articulation powers are associated with
articulation and non-articulation frequencies of the signals, and assessing speech quality based on
their comparison to derive parameters and use Hollier's vocal tract model or spectral parameters
because Rothenberg's articulatory analysis and parameters would provide the advantage of
specifying Hollier's models for inaudible characteristics with particular relevance to speech
quality.

32. Regarding claim 8, Rothenberg also describes: 

the articulation frequencies are approximately 2-12.5 Hz [at column 1, lines 36-40, as
airflow restricted to low, principally subsonic frequencies, below about 20 Hz that measures
articulatory patterns of the voice source].

33. Regarding claim 9, Rothenberg also describes: 

the articulation frequencies correspond approximately to a speed of human articulation [at 
column 1, lines 36-40, as airflow restricted to low, principally subsonic frequencies, below about 
20 Hz may determine the rate of lung deflation]. 

34. Regarding claim 10, Rothenberg also describes: 

the non-articulation frequencies are approximately greater than the articulation frequencies 
[at column 1, lines 36-54, as airflow restricted to low, principally subsonic frequencies, below
about 20 Hz that measures articulatory patterns of the voice source and airflow making up the
voice (well above 20 Hz)]. 

35. Regarding claim 13, Rothenberg also describes:

the comparison is a difference between the articulation power and the non-articulation 
power [at column 6, lines 25-28, as the impedance that the membrane provides is low impedance 
for audible frequencies and high impedance for low-frequency components]. 

36. Regarding claim 14, Rothenberg also describes: 

the determined speech quality is local [at column 2, lines 26-31, as the quality of the voice
being produced may have a distorted perception for the speaker].

37. Regarding claim 18, Hollier also describes: 

filtering the speech signal to obtain a plurality of critical band signals [at column 9, 
lines 21-22, as conform the signal to the Bark scale]. 

Hollier and Rothenberg and Bell

38. Claim 11 is rejected under 35 U.S.C. 103(a) as being unpatentable over Hollier et al. [US
Patent 5,799,133] in view of Rothenberg [US Patent 5,454,375] and Bell, Jr. et al. [US Patent
3,971,034].

39. Regarding claim 11, Hollier and Rothenberg describe and make obvious the included claim
elements by dependency as indicated elsewhere in this Office action. Rothenberg [at column 6, 
lines 13-39] also describes that the volume velocity of airflow pressure for voice, measured in 
some convenient form, provides a convenient measure of inaudible power of speech. 

However, Rothenberg does not discuss the details of making the comparison of articulatory 
energy to non-articulatory energy. In particular, neither Hollier nor Rothenberg explicitly 
describes that the comparison is a ratio between articulation power and non-articulation power. 

Like Rothenberg, Bell [at column 4] discusses the subsonic frequencies that are present in
articulated speech. Although Bell does not use the comparison, Bell describes a comparison of 
airflow, as follows: 

the comparison is a ratio [at column 2, lines 47-48, as inspiration-expiration ratio].

As indicated, Bell shows that a comparison ratio of airflow during speech was known to
artisans at the time of invention. The system by Rothenberg requires a comparison of airflow
values, but merely any comparison from mature technologies to complete Hollier's inaudible-
spectrum model. Rothenberg has not disclosed a preferred approach to the comparison according
to a design criterion or solution to any stated problem. Since it appears that the use of any
comparison of values that is known to artisans would serve to provide Rothenberg's
comparison of articulatory power to non-articulatory power, Rothenberg's application of a
comparison of airflow would suggest finding a comparison function used in a similar area, which
Bell describes as known to artisans for airflow comparison. It would have been obvious to one of
ordinary skill in the art of airflow measurement at the time of invention to include the concepts
described by Bell, at least a ratio comparison, because that would provide the comparison of
airflow values with which Rothenberg's analysis operates to measure articulatory power and non-
articulatory power.



Double Patenting 

Application No. 10/186,862

40. A rejection based on double patenting of the "same invention" type finds its support in the 
language of 35 U.S.C. 101 which states that "whoever invents or discovers any new and useful 
process ... may obtain a patent therefor ..." (Emphasis added). Thus, the term "same invention," 
in this context, means an invention drawn to identical subject matter. See Miller v. Eagle Mfg. 
Co., 151 U.S. 186 (1894); In re Ockert, 245 F.2d 467, 114 USPQ 330 (CCPA 1957); and In re
Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970). 

A statutory type (35 U.S.C. 101) double patenting rejection can be overcome by canceling
or amending the conflicting claims so they are no longer coextensive in scope. The filing of a 
terminal disclaimer cannot overcome a double patenting rejection based upon 35 U.S.C. 101. 

41. Claims 1-20 are provisionally rejected under 35 U.S.C. 101 as claiming the same invention 
as that of claims 10-20 of copending Application No. 10/186,862, which has at least one applicant 
in common with the instant application. This is a provisional double patenting rejection since the 
conflicting claims have not in fact been patented. Application No. 10/186,862 has been published 
as US Patent Application Publication 2004/0002857. 

Application Number 10/186,840 and Hollier

42. The nonstatutory double patenting rejection is based on a judicially created doctrine 
grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or 
improper timewise extension of the "right to exclude" granted by a patent and to prevent possible 
harassment by multiple assignees. A nonstatutory obviousness-type double patenting rejection is 
appropriate where the conflicting claims are not identical, but at least one examined application 
claim is not patentably distinct from the reference claim(s) because the examined application 
claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g.,
In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29
USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re
Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164
USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be
used to overcome an actual or provisional rejection based on a nonstatutory double patenting
ground, provided the conflicting application or patent is shown to be commonly owned with
this application or claims an invention made as a result of activities undertaken within the scope of
a joint research agreement. See 37 CFR 1.130(b).

Application/Control Number: 10/603,212 Page 16

Art Unit: 2654

Effective January 1, 1994, a registered attorney or agent of record may sign a terminal 
disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b). 

43. Claims 7-11, 13-15, and 17-20 are provisionally rejected on the ground of nonstatutory, 
obviousness-type double patenting as being unpatentable over claims 1-16 of copending 
Application Number 10/186,840, which has at least one applicant in common with the instant 
application, in view of Hollier et al. [US Patent 5,799,133]. Although the conflicting claims are 
not identical, they are not patentably distinct from each other because a person of ordinary skill in 
the art would conclude that the invention defined in the claims in issue is an obvious variation of 
the invention defined in the claims in Application Number 10/186,840. 

This is a provisional obviousness-type double patenting rejection because the conflicting 
claims have not in fact been patented. Application Number 10/186,840 has been published as US
Patent Application Publication 2004/0002852. 

44. Regarding claim 7, Application Number 10/186,840 claims the recited limitations 
recognizable as a whole to one versed in the art as the following terminology in claims 1-16: 

comparing articulation power and non-articulation power for the speech signal (or distorted 
speech signal) [at claim 1, as comparing articulation power and non-articulation power for a
speech signal]; 

wherein articulation and non-articulation powers are powers associated with articulation 
and non-articulation frequencies of the speech signal (or distorted speech signal) [at claim 1, as
wherein articulation and non-articulation powers are powers associated with articulation and non- 
articulation frequencies of the speech signal]; 

and assessing the second speech quality (or first speech quality) based on the comparison
[at claim 1, as assessing speech quality based on the comparison]. 

However, Application Number 10/186,840 does not explicitly claim determining a first 
and second speech quality assessment for a first and second speech signal, the first speech signal 
being a distorted version of the second speech signal; and comparing the first and second speech 
qualities to obtain a compensated speech quality assessment, as recited in this application's claim 
1. 

Like Application Number 10/186,840, Hollier [at column 6, lines 28-41] uses inaudible
vocal tract differences to characterize and assess speech quality, and Hollier describes: 

determining a first and second assessment(s) for first and second signals [at column 9, 
lines 14-19, as generate two measures corresponding to the "good" sample and the distorted 
sample]; 

comparing first and second to obtain a compensated assessment [at column 9, lines 15-18, 
as two measures corresponding to the "good" sample and the distorted sample are then compared 
to produce an error surface]; 

the first signal being a distorted version of the second signal [at column 8, lines 44-48, as
the corresponding second version has distortion of the "good" signal sample]; 

the signals are speech signals [at column 8, lines 37-48, as speech segments are to be 
assessed]; 

assessments and compensated assessment are speech quality assessments [at column 6, 
lines 36-39, as the model and parameters are for the purpose of measuring signal quality, i.e. the 
perceived speech quality]. 

As indicated, Hollier shows that determining a first and second speech quality assessment 
for a first and second speech signal, the first speech signal being a distorted version of the second 
speech signal; and comparing the first and second speech qualities to obtain a compensated speech 
quality assessment was known to artisans at the time of invention. Since Hollier also suggests 
modeling inaudible characteristics for speech quality assessment, and Hollier [at column 7, lines
47-55] also points out that the comparison between quality of a speech signal and its distorted
version will define a definition function that allows derivation of the quality of signals received for
communication channels, it would have been obvious to one of ordinary skill in the art of
articulation and non-articulation frequencies and powers of speech at the time of invention to
include the concepts described by Hollier, at least including determining a first and second speech
quality assessment for a first and second speech signal, the first speech signal being a distorted
version of the second speech signal; and comparing the first and second speech qualities to obtain
a compensated speech quality assessment with the claimed comparison of articulation and non-
articulation powers for a speech signal and making the assessment based on their comparison,
because the quality of signals can then be measured by a function derived from the comparison.

Claims 2-16 of Application Number 10/186,840 set forth additional limitations that claim 7 
of this application does not explicitly include.

It would have been obvious to one of ordinary skill in the art of computerized speech 
recognition at the time that the invention was made that the additional claim limitations in 
Application Number 10/186,840's claims 2-16 differ from those in this application's claim 7 only
by functions that can be eliminated if the effect of the additional functions is unneeded or
undesired. If the functionality provided by the additional limitations were not desired, it would
have been obvious to eliminate it, and so achieve the advantage of simplifying the processing. 

45. Regarding claim 8, Application Number 10/186,840 also claims the additional limitations
as the additional limitations of dependent claim 2. 

46. Regarding claim 9, Application Number 10/186,840 also claims the additional limitations
as the additional limitations of dependent claim 3. 

47. Regarding claim 10, Application Number 10/186,840 also claims the additional limitations
as the additional limitations of dependent claim 4. 

48. Regarding claim 11, Application Number 10/186,840 also claims the additional limitations
as the additional limitations of dependent claim 5, and claim 6 by dependency. Claim 6 of
Application Number 10/186,840 sets forth additional limitations that claim 11 of this application
does not explicitly include. 

It would have been obvious to one of ordinary skill in the art of computerized speech 
recognition at the time that the invention was made that the additional claim limitations in 
Application Number 10/186,840's claim 6 differ from those in this application's claim 11 only by 
functions that can be eliminated if the effect of the additional functions is unneeded or undesired. 
If the functionality provided by the additional limitations were not desired, it would have been 
obvious to eliminate it, and so achieve the advantage of simplifying the processing. 

49. Regarding claim 13, Application Number 10/186,840 also claims the additional limitations 
as the additional limitations of dependent claim 7. 

50. Regarding claim 14, Application Number 10/186,840 also claims the additional limitations 
as the additional limitations of dependent claim 8. 

51. Regarding claim 15, Application Number 10/186,840 also claims the additional limitations 
as the additional limitations of dependent claim 9, and claims 10-11 by dependency. Claims 10-11 
of Application Number 10/186,840 set forth additional limitations that claim 15 of this application 
does not explicitly include. 

It would have been obvious to one of ordinary skill in the art of computerized speech 
recognition at the time that the invention was made that the additional claim limitations in 
Application Number 10/186,840's claims 10-11 differ from those in this application's claim 15 
only by functions that can be eliminated if the effect of the additional functions is unneeded or 
undesired. If the functionality provided by the additional limitations were not desired, it would 
have been obvious to eliminate it, and so achieve the advantage of simplifying the processing. 

52. Regarding claim 17, Application Number 10/186,840 also claims the additional limitations 
as the additional limitations of dependent claim 13, and claims 14-16 by dependency. Claims 12 
and 14-16 of Application Number 10/186,840 set forth additional limitations that claim 17 of this 
application does not explicitly include. 

It would have been obvious to one of ordinary skill in the art of computerized speech 
recognition at the time that the invention was made that the additional claim limitations in 
Application Number 10/186,840's claims 13-16 differ from those in this application's claim 17 
only by functions that can be eliminated if the effect of the additional functions is unneeded or 
undesired. If the functionality provided by the additional limitations were not desired, it would 
have been obvious to eliminate it, and so achieve the advantage of simplifying the processing. 

53. Regarding claim 18, Application Number 10/186,840 also claims the additional limitations 
as the additional limitations of dependent claim 14, and claims 15-16 by dependency. Claims 12 
and 15-16 of Application Number 10/186,840 set forth additional limitations that claim 18 of this 
application does not explicitly include. 

It would have been obvious to one of ordinary skill in the art of computerized speech 
recognition at the time that the invention was made that the additional claim limitations in 
Application Number 10/186,840's claims 15-16 differ from those in this application's claim 18 
only by functions that can be eliminated if the effect of the additional functions is unneeded or 
undesired. If the functionality provided by the additional limitations were not desired, it would 
have been obvious to eliminate it, and so achieve the advantage of simplifying the processing. 

54. Regarding claim 19, Application Number 10/186,840 also claims the additional limitations 
as the additional limitations of dependent claim 15, and claim 16 by dependency. Claims 12 and 
16 of Application Number 10/186,840 set forth additional limitations that claim 19 of this 
application does not explicitly include. 

It would have been obvious to one of ordinary skill in the art of computerized speech 
recognition at the time that the invention was made that the additional claim limitations in 
Application Number 10/186,840's claim 16 differ from those in this application's claim 19 only 
by functions that can be eliminated if the effect of the additional functions is unneeded or 
undesired. If the functionality provided by the additional limitations were not desired, it would 
have been obvious to eliminate it, and so achieve the advantage of simplifying the processing. 

55. Regarding claim 20, Application Number 10/186,840 also claims the additional limitations 
as the additional limitations of dependent claim 16. Claims 12 and 15 of Application Number 
10/186,840 set forth additional limitations that claim 20 of this application does not explicitly 
include. 

It would have been obvious to one of ordinary skill in the art of computerized speech 
recognition at the time that the invention was made that the additional claim limitations in 
Application Number 10/186,840's claim 16 differ from those in this application's claim 20 only 
by functions that can be eliminated if the effect of the additional functions is unneeded or 
undesired. If the functionality provided by the additional limitations were not desired, it would 
have been obvious to eliminate it, and so achieve the advantage of simplifying the processing. 

Conclusion 

56. The following references here made of record are considered pertinent to applicant's 
disclosure: 

Parra [US Patent 5,313,556] describes comparisons of speech features in the infrasonic range of 
speech to identify and/or confirm the speaker. 

Hollier et al. [US Patent 6,035,270] describes speech features due to inaudible vocal tract 
differences to characterize speech quality. 

Hogden [US Patent 6,052,662] describes speech reconstruction using articulator models at 
subsonic frequencies to provide higher quality reconstructed speech. 

Hardy [US Patent 6,246,978] describes choosing and calculating measures that quantify distortion 
of speech that could not have been articulated in natural speech. 

Ghitza et al. [US Patent 6,609,092] describes a distortion measure between two auditory 
representations of source and processed speech. 

57. Any response to this action should be mailed to: 

Mail Stop Amendment 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

or faxed to: 

(571) 273-8300 (for both formal communications intended for entry and for 
informal or draft communications, but please label informal fax as "PROPOSED" 
or "DRAFT") 

Patent Correspondence delivered by hand or delivery services, other than the USPS, should 
be addressed as follows and brought to U.S. Patent and Trademark Office, Customer 
Service Window, Mail Stop Amendment, Randolph Building, 401 Dulany Street, 
Alexandria, VA 22314 



Application/Control Number: 10/603,212 Page 23 

Art Unit: 2654 

58. Any inquiry concerning this communication or earlier communications from the examiner 
should be directed to Donald L. Storm, of Art Unit 2654, whose telephone number is 
(571) 272-7614. The examiner can normally be reached on weekdays between 7:00 AM and 3:30 
PM Eastern Time. If attempts to reach the examiner by telephone are unsuccessful, the 
examiner's supervisor, Richemond Dorvil, can be reached on (571) 272-7602. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Inquiries regarding the status of submissions 
relating to an application or questions on the Private PAIR system should be directed to the 
Electronic Business Center (EBC) at 866-217-9197 (toll-free) or 571-272-4100 between the hours 
of 6 a.m. and midnight Monday through Friday EST, or by e-mail at: ebc@uspto.gov. For general 
information about the PAIR system, see http://pair-direct.uspto.gov. 



Donald L. Storm 

March 3, 2006 Examiner, Art Unit 2654 

Abstract of the Disclosure 

A method for objective speech quality assessment [...] 
phonetic contents, speaking styles or individual speaker differences by distorting speech 
signals under speech quality assessment. By using a distorted version of a speech signal, 
it is possible to compensate for different phonetic contents, different individual speakers 
and different speaking styles when assessing [...] 
in the objective speech quality assessment by distorting the speech signal is maintained 
similarly for different speech signals, especially when the amount of distortion of the 
distorted version of speech signal is severe. Objective speech quality assessment for the 
distorted speech signal and the original undistorted speech signal are compared to obtain 
a speech quality assessment compensated for [...]. 

Claims 

I claim: 

1. A method of assessing speech quality comprising the steps of: 

determining a first and second speech quality assessment for a first and 
second speech signal, the first speech signal being a distorted version of the 
second speech signal; and 

comparing the first and second speech qualities to obtain a compensated 
speech quality assessment. 

2. The method of claim 1 comprising the additional steps of: 

prior to determining the first and second speech quality [...], 
distorting the second speech signal to produce the first speech signal. 

3. The method of claim 1, wherein the first and second speech qualities are assessed 
using an identical technique for objective speech quality assessment. 

4. The method of claim 1, wherein the compensated speech quality assessment 
corresponds to a difference between the first and second speech qualities. 

5. The method of claim 1, wherein the compensated speech quality [...] 
corresponds to a ratio between the first and second [...] 

6. The method of claim 1, wherein the first and second speech qualities are assessed 
using auditory-articulatory analysis. 

7. The method of claim 1, wherein the step of assessing the second or first speech 
quality comprises the steps of: 

comparing articulation power and non-articulation power for the speech 
signal or distorted speech signal, wherein articulation and non-articulation powers 
are powers associated with articulation and non-articulation frequencies of the 
speech signal or distorted speech signal; and 

assessing the second or first speech quality based on the comparison. 



8. The method of claim 7, wherein the articulation frequencies are approximately 
[...] Hz. 

9. The method of claim 7, wherein the articulation frequencies correspond 
approximately to a speed of human articulation. 

10. The method of claim 7, wherein the non-articulation frequencies are 
approximately greater than the articulation frequencies. 

11. The method of claim 7, wherein the comparison between the articulation power 
and non-articulation power is a ratio between the articulation power and 
non-articulation power. 



12. The method of claim 10, wherein the ratio includes 

the numerator including the articulation power and a small constant, the 
denominator including the non-articulation power plus the small constant. 

13. The method of claim 7, wherein the comparison between the articulation power 
and non-articulation power is a difference between the articulation power and 
non-articulation power. 

14. The method of claim 7, wherein the step of assessing the first or second speech 
quality includes the step of: 

determining a local speech quality using the comparison. 

15. The method of claim [...], wherein the local speech quality is further determined 
using a weighing factor based on a DC-component power. 

16. The method of claim 9, wherein the first or second speech quality [...] 
using the local speech quality. 
17. The method of claim 7, wherein the step of comparing articulation power and 
non-articulation power includes the step of: 

performing a Fourier transform on each of a plurality of envelopes 
obtained from a plurality of critical band signals. 

18. The method of claim 7, wherein the step of comparing articulation power and 
non-articulation power includes the step of: 

filtering the speech signal to obtain a plurality of critical band signals. 

19. The method of claim 18, wherein the step of comparing articulation power and 
non-articulation power includes the step of: 

performing an envelope analysis on the plurality of critical band signals to 
obtain a plurality of modulation spectrums. 

20. The method of claim 18, wherein the step of comparing articulation power and 
non-articulation power includes the step of: 

performing a Fourier transform on each of [...] 
spectrums. 

METHOD OF REFLECTING TIME/LANGUAGE DISTORTION IN 
OBJECTIVE SPEECH QUALITY ASSESSMENT 

Field of the Invention 

The present invention relates generally to communications systems and, in 
particular, to speech quality assessment. 

Background of the Related Art 

Performance of a wireless communication system can be measured, 
among other things, in terms of speech quality. In the current art, there are two 
techniques of speech quality assessment. The first technique is a subjective technique 
(hereinafter referred to as "subjective speech quality assessment"). In subjective speech 
quality assessment, human listeners are typically used to rate the speech quality of 
processed speech, wherein processed speech is a transmitted speech signal which has 
been processed at the receiver. This technique is subjective because it is based on the 
perception of the individual human, and human assessment of speech quality by native 
listeners, i.e., people that speak the language of the speech material being presented or 
listened to, typically takes into account language effects. Studies have shown that a 
listener's knowledge of language affects the scores in subjective listening tests. Scores 
given by native listeners were lower in subjective listening tests compared to scores given 
by non-native listeners when language information in speech is defective, i.e., mute. In a 
normal telephone conversation, the listener is often a native listener. Thus, it is 
preferable to use native listeners for subjective speech quality assessment in order to 
emulate typical conditions. Subjective speech quality assessment techniques provide a 
good assessment of speech quality but can be expensive and time consuming. 

The second technique is an objective technique (hereinafter referred to as 
"objective speech quality assessment"). Objective speech quality assessment is not based 
on the perception of the individual human. Some objective speech quality assessment 
techniques are based on known source speech or reconstructed source speech estimated 
from processed speech. Other objective speech quality assessment techniques are not 
based on known source speech but on processed speech only. These latter techniques are 
referred to herein as "single-ended objective speech quality assessment techniques," and 
are often used when known source speech or reconstructed source speech are unavailable. 

Current single-ended objective speech quality assessment techniques, 
however, do not provide as good an assessment of speech quality compared to subjective 
speech quality assessment techniques. One reason why current single-ended objective 
speech quality assessment techniques are not as good as subjective speech quality 
assessment techniques is because the former techniques do not account for language 
effects. Current single-ended objective speech quality assessment techniques have been 
unable to account for language effects in their speech assessment. 

Accordingly, there exists a need for a single-ended objective speech 
quality assessment technique which accounts for language effects in assessing speech 
quality. 

Summary of the Invention 

The present invention is an objective speech quality assessment technique 
that reflects the impact of distortions which can dominate overall speech quality 
assessment by modeling the impact of such distortions on subjective speech quality 
assessment, thereby accounting for language effects in objective speech quality 
assessment. In one embodiment, the objective speech quality assessment technique of the 
present invention comprises the steps of detecting distortions in an interval of speech 
activity using envelope information, and modifying an objective speech quality 
assessment value associated with the speech activity to reflect the impact of the 
distortions on subjective speech quality assessment. In one embodiment, the objective 
speech quality assessment technique also distinguishes types of distortions, such as short 
bursts, abrupt stops and abrupt starts, and modifies the objective speech quality 
assessment values to reflect the different impacts of each type of distortion on subjective 
speech quality assessment. 

Brief Description of the Drawings 

The features, aspects, and advantages of the present invention will become 
better understood with regard to the following description, appended claims, and 
accompanying drawings where: 

Fig. 1 depicts a flowchart illustrating an objective speech quality assessment 
technique accounting for language effects in accordance with one embodiment of the 
present invention; 

Fig. 2 depicts a flowchart illustrating a voice activity detector (VAD) which 
detects voice activity by examining envelope information associated with the speech 
signal in accordance with one embodiment of the present invention; 

Fig. 3 depicts an example VAD activity diagram illustrating intervals T and G of 
speech and non-speech activities, respectively; 

Fig. 4 depicts a flowchart illustrating an embodiment for determining whether 
speech activity is a short burst or impulsive noise and for modifying objective speech 
frame quality assessment v_s(m) when a short burst or impulsive noise is determined; 

Fig. 5 depicts a flowchart illustrating an embodiment for determining whether 
speech activity has an abrupt stop or mute and for modifying objective speech frame 
quality assessment v_s(m) when it is determined that such speech activity has an abrupt 
stop or mute; and 

Fig. 6 depicts a flowchart illustrating an embodiment for determining whether 
speech activity has an abrupt start and for modifying objective speech frame quality 
assessment v_s(m) when it is determined that such speech activity has an abrupt start. 

Detailed Description 

The present invention is an objective speech quality assessment technique 
that reflects the impact of distortions which can dominate overall speech quality 
assessment by modeling the impact of such distortions on subjective speech quality 
assessment, thereby accounting for language effects in objective speech quality 
assessment. 

Fig. 1 depicts a flowchart 100 illustrating an objective speech quality 
assessment technique accounting for language effects in accordance with one embodiment of 
the present invention. In step 102, speech signal s(n) is processed to determine objective 
speech frame quality assessment v_s(m), i.e., objective quality of speech at frame m. In 
one embodiment, each frame m corresponds to a 64 ms interval. The manner of 
processing speech signal s(n) to obtain objective speech frame quality assessment v_s(m) 
(which does not account for language effects) is well-known in the art. One example of 
such processing is described in co-pending application serial number 10/186,862, entitled 
"Compensation Of Utterance-Dependent Articulation For Speech Quality Assessment", 
filed on July 01, 2002 by inventor Doh-Suk Kim, attached herein as Appendix A. 

In step 105, speech signal s(n) is analyzed for voice activity by, for 
example, a voice activity detector (VAD). VADs are well-known in the art. Fig. 2 
depicts a flowchart 200 illustrating a VAD which detects voice activity by examining 
envelope information associated with the speech signal in accordance with one 
embodiment of the present invention. In step 205, envelope signals γ_k(n) are summed up 
for all cochlear channels k to form summed envelope signal γ(n) in accordance with 
equation (1): 

$\gamma(n) = \sum_{k=1}^{N_{cb}} \gamma_k(n)$    equation (1) 

where $\gamma_k(n) = \sqrt{s_k^2(n) + \hat{s}_k^2(n)}$, n represents a time index, N_cb represents a total number of 
critical bands, s_k(n) represents the output of speech signal s(n) through cochlear channel 
k, i.e., $s_k(n) = s(n) * h_k(n)$, and $\hat{s}_k(n)$ is the Hilbert transform of s_k(n). 

In step 210, a frame envelope e(l) is computed every 2 ms by multiplying 
summed envelope signal γ(n) with a 4 ms Hamming window w(n) in accordance with 
equation (2): 

$e(l) = \log\left[\sum_{n} \gamma^{(l)}(n)\, w(n) + 1\right]$    equation (2) 

where γ^(l)(n) is the 2 ms l-th frame signal of the summed envelope signal γ(n). It should 
be understood that the durations of the frame envelope e(l) and Hamming window w(n) 
are merely illustrative and that other durations are possible. In step 215, a flooring 
operation is applied to frame envelope e(l) in accordance with equation (3). 

$e(l) = \begin{cases} e(l) & \text{if } e(l) > 5 \\ 5 & \text{otherwise} \end{cases}$    equation (3) 

In step 220, time derivative Δe(l) of floored frame envelope e(l) is obtained in 
accordance with equation (4). 

Δe(l) = [...]    equation (4) 

where -3 ≤ j ≤ 3. 

In step 225, voice activity detection is performed in accordance with 
equation (5). 

$\mathrm{vad}(l) = \begin{cases} 1 & \text{if } e(l) > 5 \\ 0 & \text{otherwise} \end{cases}$    equation (5) 

In step 230, the result of equation (5), i.e., vad(l), can then be refined based on the 
duration of 1's and 0's in the output. For example, if the duration of 0's in vad(l) is 
shorter than 8 ms, then vad(l) shall be changed to 1's for that duration. Similarly, if the 
duration of 1's in vad(l) is shorter than 8 ms, then vad(l) shall be changed to 0's for that 
duration. Fig. 3 depicts an example VAD activity diagram 30 illustrating intervals T and 
G of speech and non-speech activities, respectively. It should be understood that speech 
activities associated with intervals T may include, for example, actual speech, data or 
noise. 
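
As an illustration only, the thresholding of step 225 and the duration-based refinement of step 230 might be sketched as follows. The function names, the order in which short runs of 0's and 1's are flipped, and the treatment of runs at the edges of the signal are assumptions of this sketch and are not stated in the description; the 2 ms frame step, the threshold of 5 and the 8 ms minimum duration are taken from the text above.

```python
from itertools import groupby

FRAME_STEP_MS = 2      # frame envelope e(l) is computed every 2 ms
VAD_THRESHOLD = 5.0    # threshold used in equation (5)
MIN_RUN_MS = 8         # runs shorter than this are flipped in step 230


def raw_vad(envelope):
    """Step 225: 1 where e(l) > 5, otherwise 0."""
    return [1 if e > VAD_THRESHOLD else 0 for e in envelope]


def refine_vad(vad, min_run_ms=MIN_RUN_MS, step_ms=FRAME_STEP_MS):
    """Step 230: flip runs of 0's, then runs of 1's, shorter than min_run_ms."""

    def flip_short_runs(flags, value):
        out = []
        for v, run in groupby(flags):
            run = list(run)
            if v == value and len(run) * step_ms < min_run_ms:
                run = [1 - value] * len(run)
            out.extend(run)
        return out

    # Assumption: short gaps are filled first, then short bursts are removed.
    return flip_short_runs(flip_short_runs(vad, 0), 1)


# Hypothetical frame envelope: speech, a 4 ms gap, speech, silence,
# a single 2 ms spike, silence.
envelope = [6] * 5 + [1] * 2 + [6] * 5 + [1] * 6 + [9] + [1] * 5
refined = refine_vad(raw_vad(envelope))
```

In this made-up example the 4 ms gap is absorbed into the surrounding speech interval and the isolated 2 ms spike is discarded, leaving one interval T followed by one interval G.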

Returning to flowchart 100 of Fig. 1, upon analyzing speech signal s(n) for 
speech activity, interval T is examined to determine whether the associated speech 
activity corresponds to a short burst or impulsive noise in step 110. If the speech activity 
in interval T is determined to be a short burst or impulsive noise, then objective speech 
frame quality assessment v_s(m) is modified in step 115 to obtain a modified objective 
speech frame quality assessment ṽ_s(m). The modified objective speech frame quality 
assessment ṽ_s(m) accounts for the effects of short burst or impulsive noise by modeling 
or simulating the impact of short bursts or impulsive noise on subjective speech quality 
assessment. 

From step 115 or if in step 110 the speech activity in interval T is not 
determined to be a short burst or impulsive noise, then flowchart 100 proceeds to step 
120 where the speech activity in interval T is examined to determine whether it has an 
abrupt stop or mute. If the speech activity in interval T is determined to have an abrupt 
stop or mute, then objective speech frame quality assessment v_s(m) is modified in step 
125 to obtain a modified objective speech frame quality assessment ṽ_s(m). The modified 
objective speech frame quality assessment ṽ_s(m) accounts for the effects of the abrupt 
stop or mute by modeling or simulating the impact of an abrupt stop or mute and 
subsequent release on subjective speech quality assessment. 

From step 125 or if in step 120 the speech activity in interval T is not 
determined to have an abrupt stop or mute, then flowchart 100 proceeds to step 130 
where the speech activity in interval T is examined to determine whether it has an abrupt 
start. If the speech activity in interval T is determined to have an abrupt start, then 
objective speech frame quality assessment v_s(m) is modified in step 135 to obtain a 
modified objective speech frame quality assessment ṽ_s(m). The modified objective speech 
frame quality assessment ṽ_s(m) accounts for the effects of the abrupt start by modeling or 
simulating the impact of an abrupt start on subjective speech quality assessment. From 
step 135 or if in step 130 the speech activity in interval T is not determined to have an 
abrupt start, then flowchart 100 proceeds to step 145 where the results of modifications to 
objective speech frame quality assessment v_s(m), if any, are integrated into the original 
objective speech frame quality assessment v_s(m) of step 102. 
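
The sequential structure of flowchart 100 (check, optionally modify, move to the next check) can be pictured with the short sketch below. It is not the implementation of the embodiment: the function name, the representation of an interval, and the choice to feed each modification into the next one are assumptions made here for illustration, and the detectors and modifiers are passed in as placeholders.

```python
def apply_distortion_checks(v_s, interval, checks):
    """Run (detect, modify) pairs in order over one speech interval T.

    v_s      -- list of objective frame quality values v_s(m) from step 102
    interval -- opaque description of interval T, handed to each callable
    checks   -- ordered pairs, e.g. short burst (steps 110/115),
                abrupt stop (steps 120/125), abrupt start (steps 130/135)
    """
    modified = list(v_s)  # leave the step-102 values untouched
    for detect, modify in checks:
        if detect(interval):
            # Assumption: later modifications build on earlier ones.
            modified = modify(modified, interval)
    return modified


# Hypothetical placeholder detectors and modifiers, for illustration only.
checks = [
    (lambda t: t["burst"], lambda v, t: [x * 0.5 for x in v]),
    (lambda t: t["stop"], lambda v, t: [x - 1.0 for x in v]),
    (lambda t: t["start"], lambda v, t: [x - 2.0 for x in v]),
]
original = [2.0, 4.0]
result = apply_distortion_checks(
    original, {"burst": True, "stop": False, "start": False}, checks
)
```

Only the first placeholder check fires in this example, so the other two modifiers are skipped and the original list is left unchanged for the integration of step 145.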

Techniques for determining whether speech activity is a short burst (or 
impulsive noise) or has an abrupt stop (or mute) or an abrupt start, i.e., steps 110, 120 and 
130, along with techniques for modifying objective speech frame quality assessment 
v_s(m), i.e., steps 115, 125 and 135, in accordance with one embodiment of the invention 
will now be described. Fig. 4 depicts a flowchart 400 illustrating an embodiment for 
determining whether speech activity is a short burst or impulsive noise and for modifying 
objective speech frame quality assessment v_s(m) when a short burst or impulsive noise is 
determined. In step 405, an impulsive noise frame l_I is determined by finding a frame l in 
interval T_i, where frame envelope e(l) is maximum in accordance, for example, with 
equation (6): 

$l_I = \arg\max_{u_i \le l \le d_i} e(l)$    equation (6) 

where u_i and d_i represent frames l at the beginning and end of interval T_i, respectively. 
In step 410, frame envelope e(l_I) is compared to a listener threshold value indicating 
whether a human listener can consider the corresponding frame l_I as an annoying short burst. 
In one embodiment, the listener threshold value is 8; that is, in step 410, e(l_I) is checked 
to determine whether it is greater than 8. If frame envelope e(l_I) is not greater than the 
listener threshold value, then in step 415 the speech activity is determined not to be a 
short burst or impulsive noise. 

If frame envelope e(l_I) is greater than the listener threshold value, then in 
step 420 the duration of interval T_i is checked to determine whether it satisfies both a 
short burst threshold value and a perception threshold value. That is, interval T_i is being 
checked to determine whether interval T_i is not too short to be perceived by a human 
listener and not too long to be categorized as a short burst. In one embodiment, if the 
duration of interval T_i is greater than or equal to 28 ms and less than or equal to 60 ms, 
i.e., 28 ≤ T_i ≤ 60, then both of the threshold values of step 420 are satisfied. Otherwise the 
threshold values of step 420 are not satisfied. If the threshold values of step 420 are not 
satisfied, then in step 425 the speech activity is determined not to be a short burst or 
impulsive noise. 

If the threshold values of step 420 are satisfied, then in step 430 a 
maximum delta frame envelope Δe(l) is determined from the frame envelopes e(l) in the 
one or more frames prior to the beginning of interval T_i through the first one or more 
frames of interval T_i and subsequently compared to an abrupt change threshold value, 
such as 0.25. The abrupt change threshold value represents a criterion for identifying an 
abrupt change in the frame envelope. In one embodiment, a maximum delta frame 
envelope Δe(l) is determined from frame envelope e(u_i-1), i.e., the frame envelope 
immediately preceding interval T_i, through the frame envelope e(u_i+5), i.e., the fifth frame 
envelope in interval T_i, and compared to a threshold value of 0.25; that is, in step 430, it 
is checked to determine whether equation (7) is satisfied: 

$\max_{u_i - 1 \le l \le u_i + 5} \Delta e(l) > 0.25$    equation (7) 

If the maximum delta frame envelope Δe(l) does not exceed the threshold value, then in 
step 435 the speech activity is determined not to be a short burst or impulsive noise. 
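
The first three tests of flowchart 400 (steps 405 through 435) lend themselves to a compact sketch. The code below is an illustration under stated assumptions rather than the embodiment itself: the function name is hypothetical, the delta frame envelope is taken as a precomputed input, the interval duration is assumed to be the frame count times the 2 ms frame step, and the window of equation (7) is clamped to the available frames. The thresholds 8, 28 ms, 60 ms and 0.25 are the ones quoted in the text.

```python
LISTENER_THRESHOLD = 8.0          # step 410
MIN_MS, MAX_MS = 28, 60           # step 420 duration bounds
ABRUPT_CHANGE_THRESHOLD = 0.25    # step 430, equation (7)
FRAME_STEP_MS = 2                 # assumed spacing of frames l


def short_burst_candidate(e, delta_e, u, d):
    """Return True if interval [u, d] passes steps 405 through 430.

    e, delta_e -- lists holding e(l) and delta e(l), indexed by frame l
    u, d       -- first and last frame of the interval (inclusive)
    """
    # Step 405: frame with the largest envelope inside the interval.
    l_peak = max(range(u, d + 1), key=lambda l: e[l])

    # Steps 410/415: the peak must exceed the listener threshold.
    if e[l_peak] <= LISTENER_THRESHOLD:
        return False

    # Steps 420/425: neither too short to perceive nor too long for a burst.
    duration_ms = (d - u + 1) * FRAME_STEP_MS
    if not (MIN_MS <= duration_ms <= MAX_MS):
        return False

    # Steps 430/435: largest delta envelope from frame u-1 through u+5.
    lo, hi = max(u - 1, 0), min(u + 5, len(delta_e) - 1)
    return max(delta_e[lo:hi + 1]) > ABRUPT_CHANGE_THRESHOLD


# Hypothetical 40 ms interval (frames 5..24) with a sharp onset at frame 5.
e = [0.0] * 5 + [9.0] * 20 + [0.0] * 5
delta_e = [0.0] * 30
delta_e[5] = 0.5
is_candidate = short_burst_candidate(e, delta_e, 5, 24)
```

An interval that passes these tests is only a candidate; the remaining steps of flowchart 400 still decide whether it is treated as a short burst or as natural speech.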

If the maximum delta frame envelope Δe(l) does exceed the threshold 
value, then in step 440 it is determined whether frame m_I would be sufficiently annoying 
to a human listener, where m_I corresponds to the frame m which is impacted most by 
impulsive noise frame l_I. In one embodiment, step 440 is achieved by determining 
whether a ratio of objective speech frame quality assessment v_s(m_I) to modulation noise 
reference unit v_q(m_I) exceeds a noise threshold value. Step 440 may be expressed, for 
example, using a noise threshold value of 1.1 and equation (8): 

$\frac{v_s(m_I)}{v_q(m_I)} < 1.1$    equation (8) 

wherein if equation (8) is satisfied, it would be determined that frame m_I has sufficient 
annoyance to a human listener. If it is determined that objective speech frame quality 
assessment v_s(m_I) would be sufficiently annoying to a human listener, then in step 445 the 
speech activity is determined not to be a short burst or impulsive noise. 

If it is determined that objective speech frame quality assessment v_s(m_I) 
would not be sufficiently annoying to a human listener, then in step 450 conditions 
related to the durations of intervals G_{i-1,i}, G_{i,i+1}, T_{i-1} and/or T_{i+1} satisfying certain 
minimum or maximum duration threshold values are checked to verify that it belongs to 
human speech. In one embodiment, the conditions of step 450 are expressed as equations 
(9) and (10). 
G_{i-1,i} < 180 ms and T_[...] > 40 ms and T_[...] > 50 ms    equation (9) 
G_{i-1,i} > 40 ms and G_{i,i+1} < 100 ms and T_[...] > 60 ms    equation (10) 
If any of these equations or conditions are satisfied, then in step 455 the speech activity is 
determined not to be a short burst or impulsive noise. Rather the speech activity is 
determined to be natural speech. It should be understood that the minimum and 
maximum duration threshold values used in equations (9) and (10) are merely illustrative 
and may be different. 

If none of the conditions in step 450 are satisfied, then in step 460 
objective speech frame quality assessment v_s(m) is modified in accordance with equation 
(11): 

[...] v_s(m) / (1 + exp[-8.2(m [...]) - 10])    equation (11) 

Fig. 5 depicts a flowchart 500 illustrating an embodiment for determining
whether speech activity has an abrupt stop or mute and for modifying objective speech
frame quality assessment v_s(m) when it is determined that such speech activity has an
abrupt stop or mute. In step 505, abrupt stop frame l_M is determined. The abrupt stop
frame l_M is determined by first finding negative peaks of delta frame envelope Δe(l) in the
speech activity using all frames l in interval T_i. Delta frame envelope Δe(l) has a
negative peak at l if Δe(l) < Δe(l+j) for -3 ≤ j ≤ 3. Upon finding the negative peaks, abrupt
stop frame l_M is determined as the minimum of the negative peaks of delta frame
envelopes Δe(l). In step 510, delta frame envelope Δe(l_M) is checked to determine
whether an abrupt stop threshold value is satisfied. The abrupt stop threshold
represents a criterion for determining whether there was sufficient negative change in
frame envelope from one frame l to another frame l+1 to be considered an abrupt stop. In
one embodiment, the abrupt stop threshold value is -0.56 and step 510 may be expressed
as equation (12):

$$\Delta e(l_M) < -0.56 \tag{12}$$

If delta frame envelope Δe(l_M) does not satisfy the abrupt stop threshold value, then in
step 515 the speech activity is determined not to have an abrupt stop or mute.

If delta frame envelope Δe(l_M) does satisfy the abrupt stop threshold value,
then in step 520 interval T_i is checked to determine if the speech activity is of sufficient
duration, e.g., longer than a short burst. In one embodiment, the duration of interval T_i is
checked to see if it exceeds the duration threshold value, e.g., 60 ms. That is, if T_i < 60
ms, then the speech activity associated with interval T_i is not of sufficient duration. If the
speech activity is considered not of sufficient duration, then in step 525 the speech
activity is determined not to have an abrupt stop or mute.

If the speech activity is considered of sufficient duration, then in step 530
a maximum frame envelope e(l) is determined for one or more frames prior to frame l_M
through frame l_M or beyond and subsequently compared against a stop-energy threshold
value. The stop-energy threshold value represents a criterion for determining whether a
frame envelope has sufficient energy prior to muting. In one embodiment, maximum
frame envelope e(l) is determined for frames l_M-7 through l_M and compared to a
stop-energy threshold value of 9.5, i.e., max e(l) > 9.5. If the maximum frame envelope e(l)
does not satisfy the stop-energy threshold value, then in step 535 the speech activity is
determined not to have an abrupt stop or mute.
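
The checks of steps 505 through 535 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the list-based inputs, the exclusion of j = 0 from the peak test, the handling of frames outside the interval, and the 7-frame look-back window are all choices made for the example.

```python
def find_negative_peaks(delta_e, half_width=3):
    """Return frame indices l where delta_e[l] is strictly below every
    neighbour delta_e[l + j] for 0 < |j| <= half_width.
    Assumption: neighbours falling outside the interval are ignored."""
    n = len(delta_e)
    peaks = []
    for l in range(n):
        neighbours = [delta_e[l + j]
                      for j in range(-half_width, half_width + 1)
                      if j != 0 and 0 <= l + j < n]
        if neighbours and all(delta_e[l] < v for v in neighbours):
            peaks.append(l)
    return peaks


def detect_abrupt_stop(e, delta_e, duration_ms,
                       stop_threshold=-0.56, min_duration_ms=60.0,
                       stop_energy=9.5, lookback=7):
    """Return the abrupt stop frame l_M, or None if any check fails.

    e           -- frame envelope e(l) over the interval
    delta_e     -- delta frame envelope for the same frames
    duration_ms -- duration of the interval T_i in milliseconds
    """
    peaks = find_negative_peaks(delta_e)
    if not peaks:
        return None
    # Step 505: the most negative of the negative peaks.
    l_m = min(peaks, key=lambda l: delta_e[l])
    # Step 510: abrupt stop threshold, equation (12).
    if not delta_e[l_m] < stop_threshold:
        return None
    # Step 520: the interval must be at least as long as a short burst.
    if duration_ms < min_duration_ms:
        return None
    # Step 530: enough envelope energy in the frames leading up to l_M.
    window = e[max(0, l_m - lookback): l_m + 1]
    if not max(window) > stop_energy:
        return None
    return l_m
```

The thresholds are exposed as keyword arguments because the text describes them as values of one embodiment rather than fixed constants.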

If the maximum frame envelope e(l) does satisfy the stop-energy threshold
value, then objective speech frame quality assessment v_s(m) is modified in accordance
with equation 13 for several frames m, such as m_M, ..., m_M+6:



«K/»)=^N/m)| 



-6 



equation (13) 



[1 + exp[-2(m - -3] 

where m_M corresponds to the frame m which is impacted most by abrupt stop frame l_M.

Fig. 6 depicts a flowchart 600 illustrating an embodiment for determining
whether speech activity has an abrupt start and for modifying objective speech frame
quality assessment v_s(m) when it is determined that such speech activity has an abrupt
start. In step 605, abrupt start frame l_S is determined. The abrupt start frame l_S is
determined by first finding positive peaks of delta frame envelope Δe(l) in the speech
activity using all frames l in interval T_i. Delta frame envelope Δe(l) has a positive peak at
l if Δe(l) > Δe(l+j) for -3 ≤ j ≤ 3. Upon finding the positive peaks, abrupt start frame l_S is
determined as the maximum of the positive peaks of delta frame envelopes Δe(l). In step
610, delta frame envelope Δe(l_S) is checked to determine whether an abrupt start
threshold value is satisfied. The abrupt start threshold represents a criterion for
determining whether there was sufficient positive change in frame envelope from one
frame l to another frame l+1 to be considered an abrupt start. In one embodiment, the
abrupt start threshold value is 0.9 and step 610 may be expressed as equation (14):

$$\Delta e(l_S) > 0.9 \tag{14}$$

If delta frame envelope Δe(l_S) does not satisfy the abrupt start threshold value, then in
step 615 the speech activity is determined not to have an abrupt start.

If delta frame envelope Δe(l_S) does satisfy the abrupt start threshold value,
then in step 620 interval T_i is checked to determine if the speech activity is of sufficient
duration, e.g., longer than a short burst. In one embodiment, the duration of interval T_i is
checked to see if it exceeds the short burst threshold value, e.g., 60 ms. That is, if T_i < 60
ms, then the speech activity associated with interval T_i is not of sufficient duration. If the
speech activity is not of sufficient duration, then in step 625 the speech activity is
determined not to have an abrupt start.

If the speech activity is of sufficient duration, then in step 630 a maximum
frame envelope e(l) is determined for frame l_S or prior through one or more frames after
frame l_S and subsequently compared against a start-energy threshold value. The
start-energy threshold value represents a criterion for determining whether a frame envelope
has sufficient energy. In one embodiment, maximum frame envelope e(l) is determined
for frames l_S through l_S+7 and compared to a start-energy threshold value of 12, i.e.,
max e(l) < 12. If the maximum frame envelope e(l) does not satisfy the start-energy
threshold value, then in step 635 the speech activity is determined not to have an abrupt
start.

If the maximum frame envelope e(l) does satisfy the start-energy threshold
value, then objective speech frame quality assessment v_s(m) is modified in accordance
with equation 16 for several frames m, such as m_S, ..., m_S+6:

$$\hat{v}_s(m) = \frac{v_s(m)}{1 + \exp[-0.4(m - m_S)/\Delta e(l_S) - 10]} \tag{16}$$

where m_S corresponds to the frame m which is impacted most by abrupt start frame l_S.
It should be understood that the values used in equations (11), (13) and (16) were derived
empirically. Other values are possible. Thus, the present invention should not be limited
to those specific values.

Note that upon determining modified objective speech frame quality
assessment v̂_s(m), the integration performed in step 145 may be achieved using equation
(17):

$$v_s(m) = \min(v_{s,I}(m), v_{s,M}(m), v_{s,S}(m)) \tag{17}$$

where v_{s,I}(m), v_{s,M}(m) and v_{s,S}(m) correspond to the modified objective speech frame
quality assessment v̂_s(m) of equations 11, 13 and 16, respectively.
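
The integration of equation (17) amounts to a frame-wise minimum over the three modified assessments. A minimal Python sketch follows; the function name `integrate` and the use of plain lists indexed by frame m are assumptions made for illustration.

```python
def integrate(v_impulsive, v_stop, v_start):
    """Frame-wise minimum of the three modified assessments (equation (17)).

    Each argument is a sequence indexed by frame m. Frames that a given
    distortion did not affect are expected to carry the unmodified value."""
    if not (len(v_impulsive) == len(v_stop) == len(v_start)):
        raise ValueError("all sequences must cover the same frames")
    return [min(a, b, c) for a, b, c in zip(v_impulsive, v_stop, v_start)]
```

Taking the minimum means that, for each frame, the distortion with the largest negative impact determines the integrated value.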

Although the present invention has been described in considerable detail
with reference to certain embodiments, other versions are possible. For example, the
order of the steps in the flowcharts may be re-arranged, or some steps (or criteria) may
be deleted from or added to the flowcharts. Therefore, the spirit and scope of the present
invention should not be limited to the description of the embodiments contained herein.
It should also be understood by those skilled in the art that the present invention may be
implemented either as hardware or software incorporated into some type of processor.

Claims
I claim:

1. A method for objectively assessing speech quality comprising the steps of:

detecting distortions in an interval of speech activity using envelope
information; and

modifying an objective speech quality assessment value associated with
the speech activity to reflect the impact of the distortions on subjective speech
quality assessment.

2. The method of claim 1, wherein the step of modifying includes the step of
determining the objective speech quality assessment values for the speech
activity.

3. The method of claim 1, wherein the distortions being detected are impulsive noise,
abrupt stop or abrupt start.

4. The method of claim 1, wherein the step of detecting includes the step of
determining a distortion type.

5. The method of claim 4, wherein the distortion type is determined to be impulsive
noise if the envelope information indicates that the speech activity can be
perceived by a human listener to be noise and if the interval is of a duration long
enough to be perceived by a human listener but not too long for a short burst.

6. The method of claim 4, wherein the distortion type is determined to be impulsive
noise if the envelope information indicates that the speech activity can be
perceived by a human listener to be noise, if a ratio of the objective speech quality
assessment value to a modulation noise reference unit indicates a human listener
would perceive annoying noise, and if the interval is of a duration long enough to
be perceived by a human listener but not too long for a short burst.

7. The method of claim 4, wherein the objective quality assessment value associated
with the speech activity is modified in accordance with the following equation to
obtain a modified objective quality assessment value if the distortion type is
impulsive noise:

$$\hat{v}_s(m) = \frac{v_s(m)}{1 + \exp[-8.2(m - m_I)/e(l_I) - 10]}$$

where v_s(m) is the objective quality assessment value and v̂_s(m) is the modified
objective quality assessment value.

8. The method of claim 4, wherein the distortion type is determined to be abrupt stop
if the envelope information indicates that there was a sufficient negative change
in frame energy from one frame to another to be considered an abrupt stop and if
the interval is of a duration longer than a short burst.

9. The method of claim 4, wherein the distortion type is determined to be abrupt stop
if the envelope information indicates that a maximum frame envelope had
sufficient energy prior to ending the interval, and if the interval is of a duration
longer than a short burst.

10. The method of claim 4, wherein the objective quality assessment value associated
with the speech activity is modified in accordance with the following equation to
obtain a modified objective quality assessment value if the distortion type is
impulsive noise:



^ l>K«?)^|Ae(/^)| 



--6 



1 + exp [-2(m - w^, - 3] 

where v_s(m) is the objective quality assessment value and v̂_s(m) is the modified
objective quality assessment value.

11. The method of claim 4, wherein the distortion type is determined to be abrupt start
if the envelope information indicates that there was a sufficient positive change
in frame energy from one frame to another to be considered an abrupt start and if
the interval is of a duration longer than a short burst.

12. The method of claim 4, wherein the distortion type is determined to be abrupt stop
if the envelope information indicates that a maximum frame envelope had
sufficient energy towards a beginning of the interval, and if the interval is of a
duration longer than a short burst.

13. The method of claim 4, wherein the objective quality assessment value associated
with the speech activity is modified in accordance with the following equation to
obtain a modified objective quality assessment value if the distortion type is
impulsive noise:

$$\hat{v}_s(m) = \frac{v_s(m)}{1 + \exp[-0.4(m - m_S)/\Delta e(l_S) - 10]}$$

where v_s(m) is the objective quality assessment value and v̂_s(m) is the modified
objective quality assessment value.

14. The method of claim 1 comprising the additional step of:

prior to the step of detecting, determining the interval of speech activity
using the envelope information.

15. An objective speech quality assessment system comprising:

means for detecting distortions in an interval of speech activity using
envelope information; and

means for modifying an objective speech quality assessment value
associated with the speech activity to reflect the impact of the distortions on
subjective speech quality assessment.

16. The objective speech quality assessment system of claim 15, wherein the means
for modifying includes a means for determining the objective speech quality
assessment values without accounting for distortions for the speech activity.

17. The objective speech quality assessment system of claim 15, wherein the
distortions being detected are impulsive noise, abrupt stop or abrupt start.

18. The objective speech quality assessment system of claim 15, wherein the means
for detecting includes a means for determining a distortion type.

19. The objective speech quality assessment system of claim 18, wherein the means
for detecting includes a voice activity detector for detecting intervals of speech
activity, wherein the means for determining a distortion type examines intervals of
speech activities detected by the voice activity detector.

Abstract of the Disclosure

Disclosed is an objective speech quality assessment technique that reflects
the impact of distortions which can dominate overall speech quality assessment by
modeling the impact of such distortions on subjective speech quality assessment, thereby
accounting for language effects in objective speech quality assessment.

COMPENSATION FOR UTTERANCE DEPENDENT ARTICULATION

FOR SPEECH QUALITY ASSESSMENT

Field of the Invention

The present invention relates generally to communications systems and, in
particular, to speech quality assessment.

Background of the Related Art

Performance of a wireless communication system can be measured,
among other things, in terms of speech quality. In the current art, there are two
techniques of speech quality assessment. The first technique is a subjective technique
(hereinafter referred to as "subjective speech quality assessment"). In subjective speech
quality assessment, human listeners are used to rate the speech quality of processed
speech, wherein processed speech is a transmitted speech signal which has been
processed at the receiver. This technique is subjective because it is based on the
perception of the individual human, and human assessment of speech quality typically
takes into account phonetic contents, speaking styles or individual speaker differences.

Subjective speech q^aUty asss^^ 

The second technique is an objective technique (hereinafter referred to as
"objective speech quality assessment"). Objective speech quality assessment is not based
on the perception of the individual human. Most objective speech quality assessment
techniques are based on known source speech or reconstructed source speech estimated
from processed speech. However, these objective techniques do not account for phonetic
contents, speaking styles or individual speaker differences.

Accordingly, there exists a need for an objective speech quality assessment technique
which takes into account phonetic contents, speaking styles or individual speaker
differences.

Summary of the Invention

The present invention is a method for objective speech quality assessment
that accounts for phonetic contents, speaking styles or individual speaker differences by
distorting speech signals under speech quality assessment. By using a distorted version
of a speech signal, it is possible to compensate for different phonetic contents, different
individual speakers and different speaking styles when assessing speech quality. The
amount of degradation in the objective speech quality assessment by distorting the
speech signal is maintained similarly for different speech signals, especially when the
amount of distortion of the distorted version of speech signal is severe. Objective speech
quality assessment for the distorted speech signal and the original undistorted speech
signal are compared to obtain a speech quality assessment compensated for utterance
dependent articulation. In one embodiment, the comparison corresponds to a difference
between the objective speech quality assessments for the distorted and undistorted speech
signals.

The features, aspects, and advantages of the present invention will become
better understood with regard to the following description, appended claims, and
accompanying drawings where:

Fig. 1 depicts an objective speech quality assessment arrangement which
compensates for utterance dependent articulation in accordance with the present
invention;

Fig. 2 depicts an embodiment of an objective speech quality assessment module
employing an auditory-articulatory analysis module in accordance with the present
invention;

Fig. 3 depicts a flowchart for processing, in an articulatory analysis module, the
plurality of envelopes a_i(t) in accordance with one embodiment of the invention; and

Fig. 4 depicts an example illustrating a modulation spectrum A_i(m,f) in terms of
power versus frequency.

Detailed Description

The present invention is a method for objective speech quality assessment
that accounts for phonetic contents, speaking styles or individual speaker differences by
distorting processed speech. Objective speech quality assessment tends to yield different
values for different speech signals. The
reason these values differ is because of different distributions of spectral contents in the
modulation spectral domain. By using a distorted version of a speech signal, it
is possible to compensate for different phonetic contents, different individual speakers
and different speaking styles. The amount of degradation in the objective speech quality
assessment by distorting the speech signal is maintained similarly for different speech
signals, especially when the distortion is severe. Objective speech quality assessment for
the distorted speech signal and the original undistorted speech signal are compared to
obtain a speech quality assessment compensated for utterance dependent articulation.
Fig. 1 depicts an objective speech quality assessment arrangement 10
which compensates for utterance dependent articulation in accordance with the present
invention. Objective speech quality assessment arrangement 10 comprises a plurality of
objective speech quality assessment modules 12, 14, a distortion module 16 and a
compensation utterance-specific bias module 18. Speech signal s(t) is provided as input
to distortion module 16 and objective speech quality assessment module 12. In distortion
module 16, speech signal s(t) is distorted to produce a modulated noise reference unit
(MNRU) speech signal s'(t). In other words, distortion module 16 produces a noisy
version of input signal s(t). MNRU speech signal s'(t) is then provided as input to
objective speech quality assessment module 14.
In objective speech quality assessment modules 12, 14, speech signal s(t)
and MNRU speech signal s'(t) are processed to obtain objective speech quality
assessments SQ(s(t)) and SQ(s'(t)). Objective speech quality assessment modules 12, 14
are essentially identical in terms of the type of processing performed on the input speech
signals. That is, if both objective speech quality assessment modules 12, 14 receive the
same input speech signal, the output signals of both modules 12, 14 would be
approximately identical. Note that, in other embodiments, objective speech quality
assessment modules 12, 14 may process speech signals s(t) and s'(t) in a manner different
from each other. Objective speech quality assessment modules are well-known in the art.
An example of such a module will be described later herein.
Objective speech quality assessments SQ(s(t)) and SQ(s'(t)) are then
compared to obtain speech quality assessment SQ_compensated, which compensates for
utterance dependent articulation. In one embodiment, speech quality assessment
SQ_compensated is determined using the difference between objective speech quality
assessments SQ(s(t)) and SQ(s'(t)). For example, SQ_compensated is equal to SQ(s(t)) minus
SQ(s'(t)), or vice-versa. In another embodiment, speech quality assessment SQ_compensated
is determined based on a ratio between objective speech quality assessments SQ(s(t)) and
SQ(s'(t)). For example,

$$SQ_{compensated} = \frac{SQ(s(t)) + \mu}{SQ(s'(t)) + \mu} \quad \text{or} \quad SQ_{compensated} = \frac{SQ(s'(t)) + \mu}{SQ(s(t)) + \mu}$$

where μ is a small constant value.
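
The two comparison embodiments can be illustrated with a short Python sketch. The function name `compensate`, the `mode` argument and the default value of `mu` are assumptions introduced for this example; the text only states that μ is a small constant.

```python
def compensate(sq_undistorted, sq_distorted, mode="difference", mu=1e-6):
    """Combine SQ(s(t)) and SQ(s'(t)) into a compensated assessment.

    mode="difference": SQ(s(t)) - SQ(s'(t))
    mode="ratio":      (SQ(s(t)) + mu) / (SQ(s'(t)) + mu)
    The reversed variants ("or vice-versa") are obtained by swapping
    the first two arguments.
    """
    if mode == "difference":
        return sq_undistorted - sq_distorted
    if mode == "ratio":
        return (sq_undistorted + mu) / (sq_distorted + mu)
    raise ValueError("mode must be 'difference' or 'ratio'")
```

In the ratio form, the constant μ keeps the quotient defined when the assessment in the denominator is zero.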

As mentioned earlier, objective speech quality assessment modules 12, 14
are well known in the art. Fig. 2 depicts embodiment 20 of an objective speech
quality assessment module 12, 14 employing an auditory-articulatory analysis module in
accordance with the present invention. As shown in Fig. 2, objective quality assessment
module 20 comprises cochlear filterbank 22, envelope analysis module 24 and
articulatory analysis module 26. In objective quality assessment module 20, speech
signal s(t) is provided as input to cochlear filterbank 22. Cochlear filterbank 22
comprises a plurality of cochlear filters h_i(t) for processing speech signal s(t) in
accordance with a first stage of a peripheral auditory system, where i=1,2,...,N_c represents
a particular cochlear filter channel and N_c denotes the total number of cochlear filter
channels. Specifically, cochlear filterbank 22 filters speech signal s(t) to produce a
plurality of critical band signals s_i(t), wherein critical band signal s_i(t) is equal to
s(t)*h_i(t).

The plurality of critical band signals s_i(t) is provided as input to envelope
analysis module 24. In envelope analysis module 24, the plurality of critical band signals
s_i(t) is processed to obtain a plurality of envelopes a_i(t), wherein
$a_i(t) = \sqrt{s_i^2(t) + \hat{s}_i^2(t)}$ and
ŝ_i(t) is the Hilbert transform of s_i(t).
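
The envelope computation can be sketched in Python using only the standard library. The sketch assumes a discrete-time critical band signal and obtains the Hilbert transform through the analytic signal, whose magnitude equals the square root of s_i² plus ŝ_i². The naive O(n²) DFT and the function names are illustrative assumptions chosen to keep the example self-contained, not part of the described module.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(s):
    """Return |s(t) + j*H{s}(t)| = sqrt(s^2 + H{s}^2) for a real sequence s."""
    n = len(s)
    spectrum = dft(s)
    # Build the analytic signal: keep DC (and Nyquist when n is even),
    # double the positive frequencies, zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    analytic = dft([w * c for w, c in zip(weights, spectrum)], inverse=True)
    return [abs(z) for z in analytic]
```

For a pure cosine that completes a whole number of cycles within the frame, the resulting envelope is constant and equal to the cosine's amplitude.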

The plurality of envelopes a_i(t) is then provided as input to articulatory
analysis module 26. In articulatory analysis module 26, the plurality of envelopes a_i(t) is
processed to obtain a speech quality assessment for speech signal s(t). Specifically,
articulatory analysis module 26 does a comparison of the power associated with signals
generated from the human articulatory system (hereinafter referred to as "articulation
power P_A(m,i)") with the power associated with signals not generated from the human
articulatory system (hereinafter referred to as "non-articulation power P_NA(m,i)"). Such
comparison is then used to make a speech quality assessment.

Fig. 3 depicts a flowchart 300 for processing, in articulatory analysis
module 26, the plurality of envelopes a_i(t) in accordance with one embodiment of the
invention. In step 310, Fourier transform is performed on frame m of each of the
plurality of envelopes a_i(t) to produce modulation spectrums A_i(m,f), where f is
frequency.

Fig. 4 depicts an example 40 illustrating modulation spectrum A_i(m,f) in
terms of power versus frequency. In example 40, articulation power P_A(m,i) is the power
associated with frequencies 2 to 12.5 Hz, and non-articulation power P_NA(m,i) is the power
associated with frequencies greater than 12.5 Hz. Power P_No(m,i) associated with
frequencies less than 2 Hz is the DC-component of frame m of critical band signal a_i(t).
In this example, articulation power P_A(m,i) is chosen as the power associated with
frequencies 2 to 12.5 Hz based on the fact that the speed of human articulation is 2 to 12.5
Hz, and the frequency ranges associated with articulation power P_A(m,i) and
non-articulation power P_NA(m,i) (hereinafter referred to respectively as "articulation
frequency range" and "non-articulation frequency range") are adjacent, non-overlapping
frequency ranges. It should be understood that, for purposes of this application, the term
"articulation power P_A(m,i)" should not be limited to the frequency range of human
articulation or the aforementioned frequency range 2 to 12.5 Hz. Likewise, the term
"non-articulation power P_NA(m,i)" should not be limited to frequency ranges greater than
the frequency range associated with articulation power P_A(m,i). The non-articulation
frequency range may or may not overlap with or be adjacent to the articulation frequency
range. The non-articulation frequency range may also include frequencies less than the
lowest frequency in the articulation frequency range, such as those associated with the
DC-component of frame m of critical band signal a_i(t).

In step 320, for each modulation spectrum A_i(m,f), articulatory analysis
module 26 performs a comparison between articulation power P_A(m,i) and
non-articulation power P_NA(m,i). In this embodiment of articulatory analysis module 26, the
comparison between articulation power P_A(m,i) and non-articulation power P_NA(m,i) is an
articulation-to-non-articulation ratio ANR(m,i). The ANR is defined by the following
equation

$$ANR(m,i) = \frac{P_A(m,i) + \varepsilon}{P_{NA}(m,i) + \varepsilon} \tag{1}$$

where ε is some small constant value. Other comparisons between articulation power
P_A(m,i) and non-articulation power P_NA(m,i) are possible. For example, the comparison
may be the reciprocal of equation (1), or the comparison may be a difference between
articulation power P_A(m,i) and non-articulation power P_NA(m,i). For ease of discussion,
the embodiment of articulatory analysis module 26 depicted by flowchart 300 will be
discussed with respect to the comparison using ANR(m,i) of equation (1). This should
not, however, be construed to limit the present invention.
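
The ratio can be sketched in Python for one frame m of one channel i. The sketch assumes the ratio form (P_A + ε)/(P_NA + ε), a modulation power spectrum supplied as parallel lists of bin frequencies and bin powers, and the band edges of example 40; the function names and the default ε are hypothetical.

```python
def band_power(freqs_hz, power, predicate):
    """Sum the power of all bins whose frequency satisfies predicate."""
    return sum(p for f, p in zip(freqs_hz, power) if predicate(f))


def articulation_ratio(freqs_hz, power, eps=1e-6, low_hz=2.0, high_hz=12.5):
    """ANR for one frame of one channel, assuming
    ANR = (P_A + eps) / (P_NA + eps)."""
    # Articulation power: bins from 2 Hz through 12.5 Hz.
    p_a = band_power(freqs_hz, power, lambda f: low_hz <= f <= high_hz)
    # Non-articulation power: bins above 12.5 Hz.
    p_na = band_power(freqs_hz, power, lambda f: f > high_hz)
    return (p_a + eps) / (p_na + eps)
```

Bins below 2 Hz contribute to neither band here, matching example 40, where they are treated separately as the DC-component power.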

In step 330, ANR(m,i) is used to determine local speech quality LSQ(m)
for frame m. Local speech quality LSQ(m) is determined using an aggregate of the
articulation-to-non-articulation ratio ANR(m,i) across all channels i and a weighing
factor R(m,i) based on the DC-component power P_No(m,i). Specifically, local speech
quality LSQ(m) is determined using the following equation

$$LSQ(m) = \log\left[\sum_{i=1}^{N_c} R(m,i)\,ANR(m,i)\right] \tag{2}$$

where

$$R(m,i) = \frac{\log(1 + P_{No}(m,i))}{\sum_{k=1}^{N_c} \log(1 + P_{No}(m,k))} \tag{3}$$

and k is a frequency index.
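
The aggregation of step 330 can be sketched in Python. The sketch assumes that the local speech quality is the logarithm of the R-weighted sum of ANR over channels and that each weight is log(1 + P_No) normalized over all channels; natural logarithms and the function names are assumptions, since the log base is not stated.

```python
import math


def channel_weights(p_no):
    """Weights R(m, i) for one frame from the DC-component powers of all
    channels, assuming R = log(1 + P_No(i)) / sum_k log(1 + P_No(k))."""
    logs = [math.log(1.0 + p) for p in p_no]
    total = sum(logs)
    if total == 0.0:
        raise ValueError("DC-component power is zero in every channel")
    return [v / total for v in logs]


def local_speech_quality(anr, p_no):
    """LSQ for one frame, assuming LSQ = log(sum_i R(i) * ANR(i))."""
    if len(anr) != len(p_no):
        raise ValueError("anr and p_no must have one entry per channel")
    weights = channel_weights(p_no)
    return math.log(sum(r * a for r, a in zip(weights, anr)))
```

Because the weights sum to one, a frame whose channels all share the same ANR value yields the logarithm of that value regardless of the DC-component powers.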

In step 340, overall speech quality SQ for speech signal s(t) is determined
using local speech quality LSQ(m) and a log power P_s(m) for frame m. Specifically,
speech quality SQ is determined using the following equation

$$SQ = L\{P_s(m)\,LSQ(m)\}_{m=1,\,P_s > P_{th}}^{T} = \left[\sum_{m=1,\,P_s > P_{th}}^{T} P_s^{\lambda}(m)\,LSQ^{\lambda}(m)\right]^{1/\lambda} \tag{4}$$

where $P_s(m) = \log\left[\sum_{t \in m} s^2(t)\right]$, L is the L_p-norm, T is the total number of frames in speech
signal s(t), λ is any value, and P_th is a threshold for distinguishing between audible signals
and silence. In one embodiment, λ is preferably an odd integer value.

The output of articulatory analysis module 26 is an assessment of speech
quality SQ over all frames m. That is, speech quality SQ is a speech quality assessment
for speech signal s(t).

Although the present invention has been described in considerable detail
with reference to certain embodiments, other versions are possible. Therefore, the spirit
and scope of the present invention should not be limited to the description of the
embodiments contained herein.



